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Abstract. We study analytically the performance of a recently proposed algorithm 
for learning the couplings of a random asymmetric kinetic Ising model from finite 
length trajectories of the spin dynamics. Our analysis shows the importance of the 
nontrivial equal time correlations between spins induced by the dynamics for the speed 
of learning. These correlations become more important as the spin’s stochasticity is 
decreased. We also analyse the deviation of the estimation error from asymptotic 
optimality. 


1. Introduction 

Recently, the learning of synaptic conplings for a recurrent neural network modelled by 
a kinetic Ising model with random couplings has attracted attention in the statistical 
physics community, see e.g [1-10]. The model is dehned by a system of N Ising spins at 
connected through couplings Jij. We assume throughout the paper that the interactions 
are non-symmetric, i.e. we have Jij ^ Jji and Ju = 0. The system evolves in discrete 
time according to a synchronous parallel dynamics, where spins at time t + 1 are updated 
independently with transition probability (specialised on the case of no external helds) 

2 cosh(/3 - 1)) ’ 

We are interested in learning the spin couplings Jij, assuming that a complete trajectory 
{cr}o:T = Wi{t)}i=i,...,N,t=i,...,T of length T for all spins is observed. A well known solution 
to this problem is given by the method of maximum likelihood, which leads to a set of 
coupled nonlinear equations which have to be solved by iteration. A computationally 
much simpler and elegant solution valid for large networks with random couplings which 
avoids an iterative solution was recently presented in [1]. This solution is based on an 
exact mean held (EMF) expression for spin correlations which can be explicitly solved for 
the couplings. The EMF estimator replaces exact correlations by empirical correlations 
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which can e.g. be computed from a single spin trajectory. Simulations have shown good 
agreement between true and estimated couplings [1]. 

Of course, if there is only a limited number of observations available there will be a 
nonzero estimation error for the EMF method. One may then ask how much one has to 
pay for the numerical efficiency of the algorithm in terms of a loss in statistical efficiency. 
Hence, we would like to investigate at what rate the error decreases with growing length 
of trajectories and if the decrease is slower than that of a statistically efficient estimator 
such as the maximum likelihood estimator which has an optimal asymptotic rate [11]. 
Using the replica method we will compute the estimation error of the EMF method in 
the thermodynamic limit N ^ oo assuming that the data are generated from a kinetic 
Ising model with true couplings drawn at random from a Gaussian distribution. The 
analysis of the statistical properties is significantly simplified by the fact that kinetic 
Ising models with non-symmetric random couplings have spin correlations which decay 
after a single time step (see for example [12]) and computations of learning curves 
resemble those for temporally independent data. A nontrivial aspect however is the 
occurrence of equal time spin correlations of the spin dynamics. We compute an exact 
result for the statistics of the random correlation matrix. From this it is possible to 
obtain an explicit expression for the learning curve for the EMF algorithm and the 
asymptotics of the ML estimator. 


2. Estimators 


The EMF estimator [1] is based on a linear relation between the time-delayed and the 
equal time correlator matrices. 


Cij = {Sai(t)Saj(t)), Dij = {5ai{t + l)5aj{f)), (2) 


for the spin fluctuations Sajit) = (Jj{f) — mjit), where mj(t) denotes the local 
magnetisation at time t and the brackets (...) denote expectation with respect to 
the spin dynamics (1). Here we assume stationarity for which the matrices are time 
independent. If the couplings are assumed to be mutually independent Gaussian 
random variables, with zero meand and variance 1/A^, the following mean field relation 
is found to be exact in the thermodynamic limit N ^ oo : 


Ldij a^i ^ ^ Ji] 

k 


ik ^kj 1 


(3) 


where 


ai = /3 


Vx 


1-tanh2[/3(iL"^* + a;v/^)] , 


A. = ^Jj(l-mf) (4) 

j 


and Vx is the normal Gaussian measure. Throughout the paper we will specialise to 
the case of zero external field and vanishing initial magnetisations. In this case we have 
mi{t) = 0, iL®** = 0, Aj = 1 and a* = a is independent of time. For the estimator the 
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exact correlation matrices C and D are approximated by empirical averages using a 
long trajectory of spins (assuming zero magnetisations): 

1 T 1 ^ 

Cij —)■ Cij = — ^^ai{t)aj{t), Dij —)■ Dij = — ^^(Ji[t + l)aj{t). ( 5 ) 

t=i t=i 

One can then obtain the couplings by inverting (3) as follows: 

Jij = — ■ ( 6 ) 

® k 

It is easy to see that the EMF estimator can be rephrased as the minimiser of the 
following cost function 

^MF = b f ® 

^ t=i V j 

with respect to the couplings Note that the estimation of the ingoing couplings 

for each spin i can be treated separately for the coupling distribution we are 
considering. The EMF estimator is based on simple explicit computation (inversion 
of the correlation matrix in 6, which is possible if the parameter a = T/N is grater 
than 1) which makes the method fast. Other estimators such as the well known 
maximum likelihood method (ML) have to resort to numerical optimisations using 
iterative algorithms which could become computationally involved for large system sizes 
N and a large number of data T. The ML estimator maximises the probability of spin 
histories {cr}o:T given by 

N T 

i=l t=l 

where P((j(0)) is the initial probability of spins. Since this probability factorises in the 
spins i and Jij are assumed independent, the ML estimator for all couplings 
pointing into spin i minimises the cost function 

Eml = - 1) + ln2cosh(/3 ^ Jij(Tj{t - 1)) j . (9) 

t=i \ j j J 

While minimizing the cost function (7) just requires the computation of the empirical 
averages C and D, in order to minimize (9) with respect to Jij one needs to compute the 
quantity Xltexplicitely depends on the current value of 
Jij and has to be recomputed at each step of the algorithm, adding a Wtep • T operation 
to the calculation. We observe that in order to avoid second order methods in the 
solution we need a hne tuning of the step size which makes the algorithm fairly slow 
for large N. Although it is more computationally expensive, the ML estimator has the 
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important property that it is asymptotically (i.e. for T — )■ cxd) efficient. This means that 
the asymptotic convergence of the mean squared estimation error to zero (assuming the 
model is correct) happens at a rate which is minimal for any (asymptotically) unbiased 
estimator [11], In the following we will compute the error of the EMF algorithm in 
the thermodynamic limit iV, T —)■ oo, keeping a hxed and compare with the asymptotic 
a ^ oo optimal error rate of the ML estimator. 

3. Learning curves from the replica approach 

In this section we will introduce the replica method for computing the EMF prediction 
error as a function of the scaled number of observed data. We will work in a teacher- 
student scenario [13, 14], where the data are assumed to be generated at random 
from the dynamics of a teacher network with random couplings J*y We will use the 
scaling J*j = and assume that the W*j are independent Gaussian random 

variables with W*j ~ A/’(0,1). We can treat the estimation of the ingoing couplings 
W* = for each spin i separately. For the sake of simplicity, in the following 

we will drop the index i and dehne Wj = Wij. The average square prediction error for 
any estimator of the couplings given by W is dehned as 

e = ^\\W* -W\\^ = l-2p + Q, (10) 

where we dehned 

p = N-^W* ■ W, Q = N-^]\W\^. (11) 

The bar denotes an average over the spin trajectories {cr}o:T generated with couplings 
W* and over the teacher couplings. We will now analyse the performance of algorithms 
which minimise a cost function of the type 

^ ht = ^'^Wjaj{t-1), ( 12 ) 

i=l j 

such as (7) and (9), on a random hnite set of spin trajectories of size T. One can 
compute average properties such as the order parameters p and Q by introducing an 
auxiliary probability density of couplings, 

q{W) = i (13) 

with a formal inverse ’temperature’ parameter v and the partition function 

Z{(t) = j (14) 

For any z/, we can compute disorder averages of ’thermal averages’ of variables such as 

p and Q from the quenched average of the free energy per coupling, dehned by 

_ f) _ 

F = —N~^v~^\ogZ{<T) = —lim —N~^ log Z”(cr). 

n^>o dn 


( 15 ) 
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By taking finally the limit n —>■ oo (zero ’temperature’), the probability density (13) 
concentrates at the minimum of E{W) and we can extract the desired order parameters. 
To compute the average, we will make the following assumptions. While the spins (Ji{t) 
are still treated as binary random variables, in computing expectations over Uj{t) for 
j ^ i we assume a central limit theorem to be valid for the helds ht as sums of a large 
number of weakly dependent random variables. Hence, we consider only the second 
order statistics of these variables and treat them as Gaussian random variables. For 
equal times the corresponding Gaussian density would be p{{o'j(t)}jjLi) = J\f{0,C), 
where the stationary covariance matrix (7 is a random matrix which itself depends 
on the random matrix of teacher couplings W* of the entire network. For different 
times t 7 ^ t' , dependencies between spins Cjit) and Ckit') are neglected. This is in 
accordance with our previous assumptions for \t — t'\ > 1, but we need an extra 
argument to justify neglecting Djk giving the correlations at times t and t + 1. In 
principle, D might enter the computation of order parameters as well. (3) shows a 
relation between the D and C matrices involving the teacher couplings linearly. The 
arguments presented later in section 4 indicate that for the asymptotic random matrix 
calculations involving similar relations we can treat teacher couplings and random 
matrices C as asymptotically independent. Hence, we argue that in an expectation 
over teacher couplings the contributions due to D vanish. We will see later that the 
statistical properties of the matrix C will enter the hnal result of the learning curve 
through the self averaging moment C^i = ^TrC~^. We will then show in section 4 
how this and other moments can be computed. Thus we will include the average over 
the teacher couplings Wkj ioi k ^ i in the statistics of C, but we need to perform 
the average over the teacher couplings W* = W*j pointing to spin i explicitly. Finally, 
the dependencies between random correlation matrices C at different times are also 
neglected for N ^ oo. This results in an effective statistical weight over spin histories 
given by 


P{ct) ~ / dW*e-2 




n 

t=i 




2 cosh 






( 16 ) 


where the Gaussian measure accounts for our prior knowledge on the teacher couplings 
distribution. Hence, for large N, we are effectively dealing with the statistical mechanics 
of a learning problem for a binary classiher neural network (aka logistic regression), 
where the ’input’ data aj(t — l) are used to predict the ’outputs’ Ciit)] the input varibales 
are independent for different t, but have nontrivial ’spatial’ correlations given by the 
matrix C. The calculation of the free energy follows the steps of replica calculations for 
perceptron learning problems [13-15]. Averages over aj(t) factorize over time and can 
be expressed through Gaussian helds ha for each replicated coupling variable HG, and 
helds u = W*aj{t — 1) for the teacher. Under the replica symmetry assumption, 

which is plausible to be correct for convex cost functions, the covariances are expressed 
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by order parameters 


ij 

(haU) = 


hi) 


(hah) =-J2Wta,W^ = q a^b 


and the free energy (15) is compnted as (appendix A): 

F = -Extr,^^„t - i log(, - *) - ^rr logC 

+Q; > / VWy - - 

ao J 2cosh[/3(./l - ^t + ^y)] 


ty (. 

q y/q'- 




The limit v ^ oo will occur with go g, since the different solutions W have to 
converge to the same minimum. In this limit, keeping the quantity x = {qq — q)v finite, 
we finally get 


F = —Extr, 


q,R,x,z 


q-R^ 

2x 




^Paoiyjl-^^t+^y) 


2cosh[fiiJl-ft + j=y)] 


-^{(^Oy\fxz + y/qy) 


Remarkably, the explicit dependence of F on the correlation matrix (last term in the 
first line of equation 21) drops when taking the limit v — )■ cxd. Hence, the result we get 
for F and for the order parameters extremizing F is the same that we would get if the 
spins over which we are computing the expectations were independent and the matrix 
C was not included in the calculation. Still, the correlation matrix affects the error 
through the parameters p and Q defined in (10), which are found to be (appendix A) 

P =R, (23) 

Q^R^ + {q- fl2)lrrC=5‘, (24) 

where R and g are the order parametrs extremizing the free energy (22). Inserting the 
above equations in (10) we find the following result for the error: 

£ = 1 - 2R + g + (g - R2) (Lrr^ - 1^ . (25) 
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The last term represents the effect of the correlations of the data on the error and 
vanishes when C equals the unit matrix. This term can be shown to be positive and 
leads to an increase in error. In section 5 we will give explicit results for the error of the 
EMF algorithm. 

4. Statistics of correlation matrices 

In this section we show how one can compute the stationary value of the negative integer 
moment of the spin correlations 

C_i = lim lim —TrC“^(t), (26) 

t—>-oo N^oo N 

necessary for the estimation error (25). Here the bar denotes expectation with respect 
to independent random Gaussian couplings with zero mean and variance 1/N. Our 
analysis begins with the time evolution for the correlation matrix C{t) assuming zero 
magnetisations rnj{t) = 0. Following [1], we can assume that in the limit of large N the 
random variables gi and gj, where gi = '^k are zero mean Gaussian random 

variables with (giPj) = Ylikr^ikCki{t)Jij and {gf) = 1. An expansion with respect to 
weak correlations similar to equations (15-16) in [1] yields the time evolution 

C{t + l) = I-i{t) + a^JC{t)J^, (27) 

where I is the unit matrix, (7(0) = I and J is the N x N coupling matrix. The 
selfaveraging quantity 7 must be determined such that Cuit) = 1 yielding the condition 
that 7 (t) = 1 — Tr JC(t) J"*". Since we are interested in the stationary solution (25), 
we introduce the limiting value 7 = limt^oo 7 (t) and dehne B(t) = ^Cit), obtaining 
the simplihed iteration 

t 

B{t + 1) = I + a'^JB{t)J~^, having the solution B{t) = ^ J^(j""")^. (28) 

k=0 

Note that in the limit of small /d (small a) one could choose to truncate the series in 
(28) to the hrst order in a (corresponding to k=0) and thus approximatig B by the 
unit matrix, or to keep the first two orders in a (up to k = 1) and thus getting the sum 
of the unit matrix and a Wishart matrix. From the above equations we get 7 = 

We can use (28) to derive an iteration for the generating function of integer moments. 
In the thermodynamic limit the calculation simplihes remarkably. Gonsider e.g. the 
computation of limAr^oo ;^Tr.B^(t -|- 1 ) for some integer k. One would have to deal with 
terms of the form 

^Tr( JB(t) JT JB{t)J^ ■ ■ ■ JB{t)J^). (29) 

Given the Gaussian form of the J random matrix, Wick’s theorem applies and the 
expectation in (29) can be computed using diagrammatic techniques. As is well known 
[16], for N ^ 00 only the planar diagrams, i.e. the ones for which lines are not crossing. 
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will contribute to the limit. Besides, note that in the evaluation of (29) the terms 

I-1 I-1 

containig J... J and pairings will vanish because of the asymmetry of the 

J matrix. It is easy to see (an example is given in Appendix C) that this implies that 

I- 1 I- 1 

also pairings of the kind B{t)... J and B{t)... are forbidden. Hence, in computing 
moments by iteration over time, we can treat as independent from B{f). We will 
not pursue the diagrammatic approach further but use this independence directly in the 
selfconsistent computation of the generating function S{x) of the asymptotic integer 
moments. This is given by 


S{x) = lim St{x) = y^f-x)’^Bk, 

t^oo 


k=0 


where 


^i(a;) = lim ^Tr(/ + xB{t)) \ 
N^oo iV 


(30) 


(31) 


Bk = lim lim —TrB^(t). 

t^oo N^oo N 

Finally, from S{x) we can also deduce (26) 


C_i = — lim xS{x). 

'y X^OO 


(32) 


We use an expression for St{x) based on the Gaussian ensemble of auxiliary N- 
dimensional vectors y. This is dehned by the partition function 

1 


Zt+i{x) = / JJdi/iexp 

i 

In dvi exp 


{I + xB{t + l))y 


1 \ T 

--[i + x)y y-^ 


y^JB{t)J'y 


(33) 


from which the generating function is obtained as 






(34) 


where the brackets denote expectation wrt to (33). We compute the average over random 
matrices J, using the fact that we can neglect the dependency between the random 
matrices J and B{t) in the partition function (33). An annealed average of (33) and 
the limit t —)■ cx) (appendix B) yields the self consistent equation 

1 


^ [a‘^xS{x)) . 

X ~ 1 ~ X 


(35) 


The explicit computation of moments is facilitated by introducing an auxiliary function 
0 , its power series expansion (whose coefficients are denoted by Mk) and its inverse by 


(fix) = 


2 

ax 


■s 


X 


X 


— X 


= X 


X;(-i)Va4, 


k=0 


a?yS{y) = (j) 


l + y 


(36) 

(37) 



Learning of couplings for random asymmetric kinetic Ising models revisited 
From (30), (36) and taking the limit y —)■ cxd in (37), we obtain 


9 


C'_i = 



1 

7 


k=0 


(38) 


We will next see how to obtain closed form expressions for the and recursively. 
Let us hrst show that for known values of i?i,..., Bn, we can compute Mn- From (35) 
and (36) we get the expression 


(f^x) = xS^ef^x)). (39) 

Applying Lagrange’s inversion formula [17] to (39) one can express the coefficients of 
the power series expansion of (t){x) in terms of those of S\ 

where [0"'] denotes the coefficient of 0” in a power series expansion of the mathematical 
expression in the brackets {..Finally, we insert in (40) the expansion of S (30). One 
can see that the coefficients are of the form 

Mn = Bn + fn{Bi, . . . , Bn-l), (41) 

where the functions /„ can be computed in closed form for any n with a computer 
algebra programme such as Mathematica. To obtain a relation for Bn, we expand both 
sides of (36) into powers of y. Using elementary properties of binomial coefficients and 
comparing coefficients of y” yields the second explicit relation 



Hence, inserting (41) into (42), we obtain 



Unfortunately, the series (38) turns out to be an asymptotic one. Coefficients diverge 
for n — )■ cxD and one has to use a regularisation method such as the Borel summation 
or the Fade approximation in order to extract a useful result out of a hnite number of 
coefficients. We have resorted to the latter method (appendix D). Our results obtained 
in this way are in excellent agreement with simulations of the kinetic Ising model for 
N = 200 and T = 1000. Figure 1 shows that for small values of a, i.e. small (d, the 
matrix C ~ I. For increasing fd also C_i increases but remains hnite. Note, that for 
0 —)■ oo, the parameter a converges to the value a = 
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Figure 1: The analytic result(black line) for C^i = ^TrC~^ is compared with the values 
obtained from simulation (blue line) for N = 200 and T = 1000. Results are averaged 
over 50 istances of the network and error bars are negligible. 


5. Results 


In the case of the EMF estimator (7) the free energy (22) becomes 

F = I (l + ^-aRjv.. tanh(/3n) } . (44) 

and the extremum conditions yield the following equations for the order parameters: 


R=1 

a‘^{a — 2 ) + 1 
^ a^(a — 1) 

1 

2a‘^{a — 1 ) 

Inserting the above equations in (25) the error is computed as follows: 


^EMF — 



1 - aM 


N 


TrC-K 


(45) 

(46) 

(47) 


(48) 


We defer a detailed analysis of the hnite a performance of the ML estimator to a 
future publication. Here we are interested in the leading behaviour of the decay of the 
prediction error as a —)■ cx). It is well known that ML estimators are asymptotically 
efficient, i.e. the errors decay at an optimal speed. Hence, our asymptotic result should 
be a yardstick that allows for a comparison of algorithms. The calculation in Appendix 
F shows that for large values of the a parameter this optimal error decays as 


^opt — 


1 1 
fdaa N 


TrC-\ 


(49) 
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Figure 2: Mean squared error of the couplings inferred with the EMF method (red dots) 
for a system of size N=200 with (3 = 1. Results are averaged over 25 istances of the 
network. Error bars are negligible. The red line corresponds to the replica result for 
the EMF prediction error, the blue line to the replica result for the asymptotic optimal 
prediction error. 


Hence, for a —>■ oo, we have 


lim 

a^oo CeMF 


a 

/9(1 — a^) 


(50) 


For small /3, i.e. large stochasticity of the spins, we have a ^ (3 and both algorithms 
decay at the same rate. This can still be seen in hgure 2 for (3 = 1, where the 
EMF algorithms performs close to optimal. For larger (3, the spins behave more 
deterministically and as shown in hgure 3 the EMF algorithm deviates signihcantly 
from optimality. We have also included data points from a simulation of a penalised 
ML estimator, where we have minimised the cost function Eml + ^ 2 ^ numerically 
by a gradient descent algorithm. Note that the penalty term we chose is equivalent 
to the prior and we are thus maximizing the log-posteror. One can see that this type 
of algorithm achieves asymptotic optimality. Finally, with increasing f3 the ratio (50) 
decays to zero. While the decay rate of the EMF algorithm converges to a nonzero 
value (note that for /3 —)■ 00 , we have a the optimal asymptotic error rate 

converges to zero indicating a transition to a faster decay than l/a in the limit. It is 
also interesting to note that for larger f3 simulations of the EMF algorithms show strong 
hnite size effects in N and the error reaches a plateau for increasing a. Hence, we had 
to apply a hnite scaling for the last simulation point in hgure 3 . 
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Figure 3: Mean squared error of the couplings inferred with the EMF method (red dots) 
for a system of size N=200 with ft = 5. Results are averaged over 25 instances of the 
network. The red line corresponds to the replica result for the EMF prediction error, 
the blue line to the replica result for the optimal prediction error. The blue dots are 
results from simulations of a penalised ML algorithm. Error bars are negligible. For 
large values of a, the EMF method displays hnite-size effects (see the red dot at a = 50), 
which are stronger for larger ft. The green dot takes into accout hnite-size corrections, 
and it is obtained as explained in hgure 4. 
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Figure 4: EMF prediction error for hxed a = 50 and /9 = 5 as a function of N. Fitting a 
power law to the data we hnd the asymptotic value valid for large N, which corresponds 
to the green dot in hgure 3. 
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6. Outlook 

It will be interesting to develop and study algorithms which include prior knowledge 
about the couplings to be learnt. This could be done within a Bayesian approach 
where a prior probability density over couplings is specified. In this way one may e.g. 
introduce sparsity. Using a similar replica approach, one could compare the performance 
of different algorithms to that of the Bayes estimator, which is optimal on average over 
teacher networks drawn at random from the prior. A nontrivial question is that of an 
algorithmic realisation of the Bayes predictor. We expect that cavity approaches (TAP 
equations) could be applied to get a tractable approximation which becomes exact in 
the thermodynamic limit. We also expect that one should include explicit knowledge 
of the statistics of the spin correlations into such an approach in order to get optimal 
performance. 
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Appendix 


A. Details of the replica calculation of the free energy 


After some standard manipulations [13-15], the quenched free energy (15) is computed 
as 


F = —Extr, 


q,R,qo 


V 


G{R, g, go) + « 5:/ VtVy- 

O-Q 




2cosh[/3(^/l - ^y)] 


(51) 


log j Vze-''G^O,yF^z+^y)^ ^ 


where G{R, g, go) is the weight of the coupling vectors W which are constrained by the 
order parameters: 


G{R, g, go) 


lim -^^InZcoup: 
n^o dn N ^ 


(52) 


with 


Zcoup = I dW*l[ n <^( 5 ^ W^G,,W* - Nqo) 

' a a ij 

n - NR) ni(E - Nl)- 

a ij a<b ij 


(53) 


We can decouple the integrals over different spins by diagonalising G = UAU~^ and 
transforming to new variables U"^W°‘ —)■ IU“, U"^W* —)■ W* which we give just the same 
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name: 

= J dW*ll n - Nqo) 

a a i 

n - NR) n - Nq). (^ 4 ) 

a i a<b i 


The integration over the couplings and the auxiliary parameters gives rise to the 
following equation for G\ 


G{R,q,qo) = ]r-—— - ^ log(g - go) - 

2 g — go 2 2N 


(55) 


In order to compute the parameters p and Q from the free energy F , we introduce the 
auxiliary variables {rii-,P 2 \ in the partition function Z^oup (54) as follows: 


7 _ 

^ coup 


dW* JJ dW^ dqo dR dq 


_ 1 W*W’ 

g 2*^ VV 


n 


Mo(i:iWtAiW*-Nqo) 


n 


w-{Ai+m)w--NR) TT iq{T.i w-{Ai+m)w^-Nq) 


n 

a<b 


(56) 


By derivatives with respect to {pi, 772 } and taking the limit 771 —)■ 0 ,772 ^ 0 one recovers 

(24). 

B. Derivation of the generating function 

For a Gaussian model without external held we have (yi) = 0, hence q = = 0 

and there is no need to introduce replicas, (absence of spin-glass ordering) and we can 
restrict ourselves to an annealed average. Decoupling the quadratic form in the exponent 
of (33) using correlated Gaussian random vectors with covariance {zz~^)c = B{t), we 
get 


Zt+i{x) = j PJdT/iexp 


--{l + x)y y 


exp 


2 

ax 


2N 


z^z){y'y) 


(X 


10 


jv+i N \ 1 / , 

ds s 2 exp[- (1 + xjsJ ( expf- 


9 

ax 


z^z)s 


(57) 


POO J\J 

cx / ds s^~ exp[——(l + a;)s] \l + a‘^xsB{t)\ 
Jo 2 

r>oo 

.. ' ,2 


iv+i 
ds s 2 exp 


'0 


N 1 

— (1 + x)s — -Trln(/ + a‘^xsB{t)) 


where in the second line we have introduced polar coordinates s = j^y y. We 
compute the dual integral for iV —)■ cx) by Laplace’s method, and use the fact that 
from (34) the maximiser of the integral gives s = ^{y'^y) = St+i^x). Finally from 
— iTrln(/ + afxsBit)) = const + \nZt{a‘^xs) we get the recursion 


St+i{x) = ——St {a‘^xSt+i{x)) 
1 + a; 


(58) 
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C. Independence of the J and B{t) matrices: an example 

To better illustrate the independence of the J and B(t) matrices, let us give an 
example and consider the evaluation of one of the terms needed for the computation of 
limAT^oo jfTrB^{t + 1) (see 29): 

^Tr( JB(t) JT JB{t)J^). (59) 

The only sets of contractions giving nonzero contribution in the large N limit are the 
following two: 

1 ^ I 1 - 2 

= -TV (B(i)) , 

I I 

The contractions involving the pairing of a J with a B{f) vanish, since they involve 
I-1 I-1 

either J ... J (J^ ... ) pairings or crossing lines (resulting in non planar diagrams), 

as shown in the two examples below: 

= 0, ^i:i{JB^{^^JB\t)j'^) = 0. (61) 


D. Fade Approximant 


The so called Fade approximant [18], is a rational function (of a specihed order) whose 
power series expansion agrees with a given power series to the highest possible order. 
Given a rational function of the form 

M ! ( ^ \ 

= X] / ( ^ + X] 

k=0 / \ k=l / 

then R is said to be the Fade approximant to the series 

CxD 

f{x) = 'Y^CkX^ (63) 

k=0 


if the following set equations is satished: 

fi(0) = /(O) 

tT 


dx^ 


R{x] 


X=0 


d^ 

dx^ 


fix] 


A; = 1,..., M + A, 


x=0 


which gives M + A + 1 equations for the unknowns Oq, ..., om and bo,..., b^. 


(64) 

(65) 
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The iterative methods explained is section 4 allows us to calculate the moments Bk 
and Mfc, dehned respectively in (31) and (36), for any given k. As an example, in the 


following we will enumerate the first three moments. 

Bi={l-a^)-^ ( 66 ) 

B2 = {1 - aY\l - aY^ (67) 

S 3 = (1 + 2 a^)(l - a®)-i(l - a^)"^(l - a^)"^ (68) 

Ml = (1 - a^)-! (69) 

M 2 = (2-a^)(l-a2)-2(l-a^)-i (70) 

Afs = (5 + — 4a® + a^®)(l — a^)“"‘(l — a"^)“^(l + + a*). (71) 


F. Asymptotic order parameters for ML estimator 

The free energy for the ML estimator is given by 

q-R^ 


F = —Extr, 


q^R,x,z 


2x 






2cosh[/3{^/l-k^t + f^y)] 


■y + /5 o-(\/^2: + y/qy) - log2 cosh[/3(^/a;2; + ./qy)] 


(72) 


It is possible to show that for a —)■ cxd one can assume that q — ^ 0, x ^ 0 and 

g ^ 1. Expanding the a dependent part of (72) for small ^/x, solving for and hnally 
taking the limit q ^ R?, we obtain 

F ^ -Extrq^R^^ { ^2^ ^ + Rb + J Vy\og2cosh[l3y/qy)]^^ . (73) 

This yields the following asymptotic scaling of order parameters: 


7? ~ 1, a; ~ 


g — ~ 


ab ab 

Inserting the above expressions in the dehnition (25) one obtains (49). 


(74) 
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